A Novel Approach Towards a Comprehensive Consensus Representation of the Expressed Human Genome

Authors

  • Winston Hide
  • John Burke
  • Robert Miller
Abstract

To generate accurate, comprehensive consensus sequences of the expressed human genome, we have developed a system that produces a consensus database of expressed human gene fragments (ESTs) with a novel representation and broad gene coverage. To cluster ESTs, we developed and employed d2-cluster, an algorithm based on the d2-search algorithm (Hide et al. 1994) and designed specifically for EST clustering. d2-cluster does not require alignment in order to perform clustering (Burke, Davison and Hide, in prep). We have incorporated d2-cluster into a portable system, STACK PACK, that performs clustering, alignment and automated error analysis of publicly available expressed sequence tags. The system includes a statistically robust algorithm that can detect and compensate for error within an aligned cluster of ESTs. We have built a database of partial human consensus sequences from 552 013 ESTs from dbEST 040896 and TIGR. The database is termed the Sequence Tag Alignment and Consensus Knowledgebase (STACK). STACK 1.0 contains 18 divisions based on tissue annotation, identifying 204 431 unique sequences and generating 76 131 consensus sequences which represent 321 134 ESTs. The consensus sequences have an average length of 497 bases, a 39% increase over the 357-base average length of the input data set. Clone IDs are used to join 92 759 unique sequences and 48 858 consensus sequences into 61 632 linked sequences, averaging 900 bases each. The distribution of clusters compares favourably with UniGene, reflecting the difference in clustering methodology and the higher number of input sequences in STACK. A high-accuracy database, SANIGENE, is also generated, consisting of sequences which agree in at least two ESTs. STACK is a distributable, core information resource upon which a comprehensive knowledgebase can be built.
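To illustrate what alignment-free clustering can look like, the following is a minimal sketch and not the authors' d2-cluster implementation. It assumes the common formulation of the d2 measure as the sum of squared differences between the word (k-mer) counts of two sequences, and it compares whole sequences rather than sliding windows, which is a deliberate simplification. The function names, the word length `k`, and the `threshold` value are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import combinations


def word_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_distance(a, b, k=6):
    """Sum of squared differences in k-word counts (assumed d2 form)."""
    ca, cb = word_counts(a, k), word_counts(b, k)
    return sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys())


def cluster(seqs, k=6, threshold=40):
    """Single-linkage clustering: join two sequences when their
    d2 distance is at most threshold (hypothetical cutoff)."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if d2_distance(seqs[i], seqs[j], k) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    ests = ["ACGTACGTACGT", "ACGTACGTACGA", "TTTTTTTTTTTT"]
    print(cluster(ests, k=3, threshold=4))
```

No alignment is computed at any point: two sequences fall into the same cluster purely because their word compositions are close. A production system would need windowed comparison and reverse-complement handling, both omitted here.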




Publication date: 1997